Extract Interesting Skyline Points in High Dimension

نویسندگان

  • Gabriel Pui Cheong Fung
  • Wei Lu
  • Jing Yang
  • Xiaoyong Du
  • Xiaofang Zhou
چکیده

When the dimensionality of dataset increases slightly, the number of skyline points increases dramatically as it is usually unlikely for a point to perform equally good in all dimensions. When the dimensionality is very high, almost all points are skyline points. Extract interesting skyline points in high dimensional space automatically is therefore necessary. From our experiences, in order to decide whether a point is an interesting one or not, we seldom base our decision on only comparing two points pairwisely (as in the situation of skyline identification) but further study how good a point can perform in each dimension. For example, in scholarship assignment problem, the students who are selected for scholarships should never be those who simply perform better than the weakest subjects of some other students (as in the situation of skyline). We should select students whose performance on some subjects are better than a reasonable number of students. In the extreme case, even though a student performs outstanding in just one subject, we may still give her scholarship if she can demonstrate she is extraordinary in that area. In this paper, we formalize this idea and propose a novel concept called k-dominate p-core skyline (Cp). C k p is a subset of skyline. In order to identify Cp efficiently, we propose an effective tree structure called Linked Multiple B’-tree (LMB). With LMB, we can identify Cp within a few seconds from a dataset containing 100,000 points and 15 dimensions.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Efficient Progressive Skyline Computation

In this paper, we focus on the retrieval of a set of interesting answers called the skyline from a database. Given a set of points, the skyline comprises the points that are not dominated by other points. A point dominates another point if it is as good or better in all dimensions and better in at least one dimension. We present two novel algorithms, Bitmap and Index, to compute the skyline of ...

متن کامل

EÆcient Progressive Skyline Computation

In this paper, we focus on the retrieval of a set of interesting answers called the skyline from a database. Given a set of points, the skyline comprises the points that are not dominated by other points. A point dominates another point if it is as good or better in all dimensions and better in at least one dimension. We present two novel algorithms, Bitmap and Index, to compute the skyline of ...

متن کامل

On High Dimensional Skylines

In many decision-making applications, the skyline query is frequently used to find a set of dominating data points (called skyline points) in a multidimensional dataset. In a high-dimensional space skyline points no longer offer any interesting insights as there are too many of them. In this paper, we introduce a novel metric, called skyline frequency that compares and ranks the interestingness...

متن کامل

Finding Superior Skyline Points from Incomplete Data

The skyline query has proven to be an important tool in multi-criteria decision making and search space pruning. A skyline query returns the subset of points from a multidimensional dataset that are not dominated by any other point. Due to its wide applications, skyline query and its variants have been extensively studied in the past. However, skyline computation for incomplete domain, where po...

متن کامل

Link-based Ranking of Skyline Result Sets

Skyline query processing has received considerable attention in the recent past. Mainly, the skyline query is used to find a set of non dominated data points in a multi-dimensional dataset. One of the major drawbacks of the skyline operator is the high cardinality of the result set. Providing the most interesting points of the skyline set (top-k) inherently involves the ranking of the skyline p...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2010